1. INTRODUCTION

Historical documents are often valuable and need to be protected and well preserved. Damage to original 
historical documents or rare texts causes degradation and can lead to further deterioration. Libraries, 
archivists and others responsible for the stewardship of precious texts often find that, in order to preserve 
and protect these original treasures, circulating them becomes almost impossible. In recent years, digital 
archiving has become the preferred solution for the storage and retrieval of these valuable archives. Historical 
document image analysis involves several steps: layout analysis, followed by text line and word segmentation, 
and finally optical character recognition (OCR) [1]. A binary image is said to be the preferred representation 
for this analysis, and the process of obtaining a binary image is called binarization [1]. 

However, historical documents pose many problems, as they include both handwritten and machine-printed 
documents. Moreover, as noted in [2], handwritten documents are more difficult to process than machine-printed 
documents. One of the main reasons is that handwritten documents lack a specific structure, and the style of 
writing is typically unique to a particular individual. Furthermore, characters may be attached or connected, 
depending on the style of the calligraphic writing. Another problem associated with handwritten documents is 
that many of them were written with quill pens. In this case, several degradations can occur, for example large 
ink stains that bleed through the paper, as well as faint characters. As stated in [3], historical documents are 
more complex to binarize than recent documents because of factors such as colour, paper ageing, stains, 
translucency and texture, among others. 

The literature in this field reveals that binarization algorithms can be divided into two groups [4]. The 
first group uses thresholding, in which an algorithm computes a threshold value that separates text from 
non-text. Early work in this group includes that of Otsu [5], Kittler et al. [6], Li & Lee [7], Niblack [8] and 
Sauvola [9], among others. Thresholding methods can be global or local. One weakness of global thresholding is 
that it fails when the background of an image varies or is heterogeneous. In other words, global thresholding 
cannot adapt to images with varying illumination and does not work well on low-quality images [10]. For local 
thresholding, the main weakness is that the parameters used by the algorithms are difficult to estimate. 
Likewise, local methods work poorly on images with a high degree of variability [11]. 
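To make the global/local distinction concrete, the following is a minimal sketch in pure Python, assuming 8-bit grey-level images stored as lists of rows, with dark text on a lighter background. It implements textbook versions of one global method (Otsu [5]) and one local method (Niblack [8], with the commonly used k = -0.2). The function names, window size and k value are illustrative assumptions, not settings taken from this work.

```python
def otsu_threshold(image):
    """Global threshold: the grey level t that maximises between-class variance.

    Pixels with value <= t form the dark (text) class.
    """
    hist = [0] * 256
    for row in image:
        for p in row:
            hist[p] += 1
    total = sum(hist)
    sum_all = sum(level * count for level, count in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0 = sum0 = 0                       # pixel count and intensity sum of the dark class
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue                    # one class is empty: not a valid split
        m0, m1 = sum0 / w0, (sum_all - sum0) / w1
        var = w0 * w1 * (m0 - m1) ** 2  # proportional to between-class variance
        if var > best_var:
            best_t, best_var = t, var
    return best_t


def global_binarize(image):
    """Apply one threshold to the whole image (1 = text, 0 = background)."""
    t = otsu_threshold(image)
    return [[1 if p <= t else 0 for p in row] for row in image]


def niblack_binarize(image, half=7, k=-0.2):
    """Local threshold T(x, y) = mean + k * std over a window clipped at the borders."""
    h, w = len(image), len(image[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            vals = [image[j][i]
                    for j in range(max(0, y - half), min(h, y + half + 1))
                    for i in range(max(0, x - half), min(w, x + half + 1))]
            mean = sum(vals) / len(vals)
            std = (sum((v - mean) ** 2 for v in vals) / len(vals)) ** 0.5
            out[y][x] = 1 if image[y][x] < mean + k * std else 0
    return out
```

For the single row `[[200, 200, 120, 200, 200, 110, 110, 30, 110, 110]]`, whose right half is a darker background (110) carrying a dark stroke (30), `global_binarize` returns `[[0, 0, 1, 0, 0, 1, 1, 1, 1, 1]]`: the single Otsu threshold (120) labels the whole darker background as text, which is the weakness described above. Niblack computes a threshold per pixel instead, but its output depends on the chosen window size and k, which is the parameter-estimation difficulty of local methods.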

The second group consists of binarization methods that use a classification approach, which allows 
features to be selected and clustered in order to produce the binarization [10]. However, the drawback of this 
approach is that it needs a training set to achieve good binarization performance, and its results depend mainly 
on the quality of the training set [10]. On the other hand, many low-level features can be used to train a 
classifier for each pixel, for example grey level intensities [12], gradients [13] and red, green and blue (RGB) 
values [14]. 

Therefore, the main objective of this work is to compare two descriptors used to classify each pixel in 
document binarization: RGB colour intensities and grey level intensities. The intensities in a 3 by 3 window 
are used as features to classify each pixel as text or non-text, and a Support Vector Machine (SVM) is used as 
the classifier. 

The remainder of this paper is organised as follows. Section 2 reviews related work. Section 3 describes 
the proposed method, and Section 4 discusses the experimental results. Finally, Section 5 presents the overall 
findings and conclusions of this work. 


2. RELATED WORKS 

Many studies have addressed historical document image binarization using a supervised approach. One 
study [1] used a hierarchical deep supervised network (DSN) architecture to classify text pixels with two groups 
of features, namely low-level and high-level features. The low-level features were used to obtain foreground 
maps, which gave much better visual quality in the boundary areas, whereas the high-level features enabled 
differentiation between the foreground and background and were shown to handle severe degradations quite 
well. In [4], degraded document images were binarized using a structured classifier, a conditional random field 
(CRF). Markov random fields generated a conditional probability distribution from the binarized image, and the 
CRF then modelled this probability. The model parameters were estimated in the training phase. Finally, the 
binarization output was chosen as the most probable binary image, that is, the one that gives the maximum 
accuracy under the trained model. 

Next, the authors of [15] proposed a learning-based binarization method and claimed that it could 
increase binarization accuracy, and stabilise quality, for documents of the same type. In the learning stage, 
knowledge of binarization evaluation and optimisation was first obtained. The binarization results were then 
fed back as input to the binarization process, so that the binarization parameters could be adjusted and the 
binarization accuracy increased. 

A learning framework for obtaining optimised parameter values for any binarization method was proposed 
by [16]. The framework consists of two stages: learn and apply. In this framework, a ground truth dataset is 
used in a learning process with three consecutive steps: extracting the features, obtaining the optimal 
parameters from the training images, and using support vector regression (SVR) to predict the parameters. The 
authors chose SVR because of its intrinsic ability to minimise structural risk. In another study, Su et al. [17] 
proposed document image binarization using a self-training approach. In this approach, pixels were grouped 
into three classes, i.e. foreground, background and uncertain pixels. A classifier was then trained on the 
foreground and background pixels of the document image, and the uncertain pixels were classified by the rules 
generated from the learned pixels. On the other hand, Xiong et al. [18] used SVM to segment text from 
degraded document images by applying different global thresholds to different backgrounds, with each image 
divided into w × w regions. Local contrast enhancement was used to pre-process every image block, followed 
by SVM to obtain a threshold value for each block. The process was applied to the entire image as a locally 
adaptive thresholding method. In another study [19], musical documents were binarized using a convolutional 
neural network, with patches containing the region surrounding each target pixel used as descriptors. The 
authors tested several modifications of the convolutional neural network classifier, varying the number of 
layers, the number of filters, the kernel size, the activation type and the number of dropout units. The main 
advantage of using a learning framework is its ability to generalise. 


3. PROPOSED METHOD 

Many researchers have performed binarization using a supervised approach, as it has proven quite 
successful in segmenting several kinds of images, such as documents and retinal blood vessels. In this study, a 
classifier was used to classify every pixel in the image from its features. As various features can be used to 
classify a pixel, this work compared RGB values and grey level values as the representation of a particular 
pixel. A 3 x 3 neighbourhood window was chosen to represent the pixel for each feature. This is illustrated in 
Figure 1 for the RGB and grey level values. 





Figure 1. Neighbourhood of 3 x 3 window for every target pixel 


where I is the target pixel; R, G and B are the intensity values for red, green and blue; and P is the 
intensity value for grey level. 


Each pixel was classified as text (label 1) or non-text (label 0). Figure 2 shows the RGB values used as 
descriptors for a target pixel, and Figure 3 shows the grey level intensities used as descriptors for a target 
pixel. The label of each sample was the value of the target pixel in the corresponding ground truth image. 
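The descriptor construction can be sketched as follows. This is a minimal illustration, not the authors' code, and it assumes that a grey image is a list of rows of integers, an RGB image is a list of rows of (r, g, b) tuples, the ground truth has already been converted so that text = 1 and non-text = 0, and border pixels are handled by replicating the nearest edge pixel (the paper does not say how borders are treated). The ordering of the 27 RGB features (R, G, B per window pixel) is also an assumption, and all function names are hypothetical.

```python
def neighbourhood(image, y, x):
    """Return the 3 x 3 window centred on (y, x), row by row.

    Border pixels are handled by replicating the nearest edge pixel
    (an assumption: the paper does not say how borders are treated).
    """
    h, w = len(image), len(image[0])
    window = []
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            yy = min(max(y + dy, 0), h - 1)   # clamp row index to the image
            xx = min(max(x + dx, 0), w - 1)   # clamp column index to the image
            window.append(image[yy][xx])
    return window


def grey_descriptor(grey, y, x):
    """9 features: the grey level values P of the 3 x 3 window."""
    return list(neighbourhood(grey, y, x))


def rgb_descriptor(rgb, y, x):
    """27 features: (R, G, B) of each window pixel, pixel by pixel (ordering assumed)."""
    return [channel for pixel in neighbourhood(rgb, y, x) for channel in pixel]


def build_rows(image, ground_truth, descriptor):
    """One row per pixel: descriptor features followed by the label (1 text, 0 non-text)."""
    rows = []
    for y in range(len(image)):
        for x in range(len(image[0])):
            rows.append(descriptor(image, y, x) + [ground_truth[y][x]])
    return rows
```

With grey level values each pixel yields 9 features, whereas with RGB it yields 27, so the RGB training rows are three times wider.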
Figure 2. RGB representation 








Figure 3. Grey level representation 
Initially, the classifier was trained with SVM, using the ground truth images to provide the correct 
label for each pixel. Because the classification is performed pixel by pixel, many rows of the training data 
are identical; these redundant rows were identified and removed to reduce complexity and training time. Then, 
8000 rows were selected at random for training: 4000 labelled as 1 and 4000 labelled as 0. LIBSVM [20] was 
used to classify the pixels. A radial basis function (RBF) kernel was used in LIBSVM, so its parameters C and γ 
had to be determined. First, C and γ were selected by 5-fold cross-validation on the training data. The selected 
values were then used to create a model from the training data. For testing, the images used in the training 
session were excluded. Figure 4 illustrates the steps of the approach employed in this work. 
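The data-preparation steps above (removing redundant rows, then drawing a balanced random sample of 4000 text and 4000 non-text rows) can be sketched in pure Python. The helper names, the seed and the fold split are assumptions for illustration. The (C, γ) grid shown is the coarse exponential grid suggested in the LIBSVM practical guide, used here only because the paper does not report which grid was searched. Training the RBF-kernel SVM itself is left to LIBSVM [20] and is not reimplemented.

```python
import random


def deduplicate(rows):
    """Remove repeated (features..., label) rows, keeping the first occurrence."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def balanced_sample(rows, per_class=4000, seed=0):
    """Randomly draw per_class rows labelled 1 (text) and per_class rows labelled 0."""
    rng = random.Random(seed)
    text = [r for r in rows if r[-1] == 1]
    non_text = [r for r in rows if r[-1] == 0]
    if len(text) < per_class or len(non_text) < per_class:
        raise ValueError("not enough unique rows in one of the classes")
    sample = rng.sample(text, per_class) + rng.sample(non_text, per_class)
    rng.shuffle(sample)
    return sample


def five_fold_indices(n, k=5, seed=0):
    """Shuffle range(n) and deal it into k disjoint folds for cross-validation."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def libsvm_style_grid():
    """Coarse (C, gamma) grid from the LIBSVM guide: C = 2^-5..2^15, gamma = 2^-15..2^3."""
    return [(2.0 ** c, 2.0 ** g)
            for c in range(-5, 16, 2)
            for g in range(-15, 4, 2)]
```

Each (C, γ) pair would be scored by training on four folds and measuring accuracy on the fifth; the best pair is then used to train the final model on all 8000 rows.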


4. EXPERIMENT AND RESULT 

The standard databases HDIBCO2014 [21], DIBCO2012 [22] and DIBCO2016 [23], which consist of 
historical document images with their respective ground truth, were used to compare the two features. The 
databases are freely available to the public and have been used by many researchers. HDIBCO2014 consists of 
10 handwritten images, of which three were used for training and six for testing in the experiment. DIBCO2012 
consists of 14 images and DIBCO2016 of 10 images. The images in DIBCO2012 and DIBCO2016 were used as test 
images, and all of them were binarized with the model constructed from the training images of HDIBCO2014. The 
ground truth of the training images was available and was used in the training session so that the classifier 
could learn the correct model. 

Several metrics were used to evaluate the binary images obtained. These metrics have also been used by 
many researchers to quantify binarization performance [24-25]. They include the F-measure, Peak 
Signal-to-Noise Ratio (PSNR) and Negative Rate Metric (NRM). Each pixel of the final image was classified as 
text or non-text, and the following expression was then applied: 


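$$\text{F-measure} = \frac{2 \times \text{Recall} \times \text{Precision}}{\text{Recall} + \text{Precision}}, \qquad \text{Recall} = \frac{TP}{TP + FN}, \qquad \text{Precision} = \frac{TP}{TP + FP} \tag{1}$$

where TP is the number of true positives, FP the number of false positives and FN the number of false 
negatives. The F-measure is expressed as a percentage; a higher value indicates a more accurate binary image. 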
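As a minimal sketch, expression (1) can be computed directly from two binary images. The code assumes both images are lists of rows that have already been converted so that text = 1 and non-text = 0. The function names are illustrative, and the percentage scaling follows the convention used in Table 1.

```python
def confusion_counts(result, ground_truth):
    """Count TP, FP, FN and TN pixels, treating text (1) as the positive class."""
    tp = fp = fn = tn = 0
    for res_row, gt_row in zip(result, ground_truth):
        for r, g in zip(res_row, gt_row):
            if r == 1 and g == 1:
                tp += 1
            elif r == 1:
                fp += 1          # predicted text, ground truth non-text
            elif g == 1:
                fn += 1          # missed a text pixel
            else:
                tn += 1
    return tp, fp, fn, tn


def f_measure(result, ground_truth):
    """Expression (1), expressed as a percentage."""
    tp, fp, fn, _ = confusion_counts(result, ground_truth)
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    if recall + precision == 0:
        return 0.0
    return 100.0 * 2 * recall * precision / (recall + precision)
```

For example, a result `[[1, 0, 1, 0]]` against ground truth `[[1, 1, 0, 0]]` has one pixel of each kind, so recall and precision are both 0.5 and the F-measure is 50.0.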
Figure 4. Steps of the proposed method 
PSNR measures how similar one image is to another; a higher PSNR indicates greater similarity between the 
two images. It is defined as 

$$\text{PSNR} = 10 \log_{10}\left(\frac{C^{2}}{\text{MSE}}\right), \qquad \text{MSE} = \frac{\sum_{x=1}^{M}\sum_{y=1}^{N}\left(I(x,y) - I'(x,y)\right)^{2}}{MN} \tag{2}$$

where I and I' are the two images being compared, namely the ground truth image and the final binary image 
obtained, M and N are the image height and width, and C is the difference between the foreground and 
background intensities. 
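A minimal sketch of the PSNR computation described above, assuming the usual definition PSNR = 10 log10(C²/MSE), where MSE is the mean squared difference between the two images over all M × N pixels. The function name and the default c = 1 (for images stored as 0/1) are illustrative assumptions; c = 255 would be used for images stored as 0/255 and gives the same PSNR.

```python
import math


def psnr(result, ground_truth, c=1.0):
    """PSNR = 10 * log10(C^2 / MSE), with MSE averaged over all M x N pixels.

    c is the foreground/background difference: 1 for 0/1 images, 255 for 0/255 images.
    """
    m, n = len(ground_truth), len(ground_truth[0])
    squared_error = sum((r - g) ** 2
                        for res_row, gt_row in zip(result, ground_truth)
                        for r, g in zip(res_row, gt_row))
    mse = squared_error / (m * n)
    if mse == 0:
        return math.inf          # identical images: PSNR is unbounded
    return 10.0 * math.log10(c ** 2 / mse)
```

Because a binary image can only differ from the ground truth by a full C at each pixel, PSNR here is a monotone function of the fraction of mismatched pixels.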


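$$\text{NRM} = \frac{NR_{FN} + NR_{FP}}{2}, \qquad NR_{FN} = \frac{FN}{FN + TP}, \qquad NR_{FP} = \frac{FP}{FP + TN} \tag{3}$$

where TN is the number of true negatives. 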

In expression (3), NRM measures the mismatch between the final binary image and the ground truth image; 
thus, a lower NRM indicates greater similarity between the two images. 
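A minimal sketch of expression (3), assuming the standard form NRM = (NR_FN + NR_FP)/2 and binary images already converted so that text = 1 and non-text = 0. The function name is illustrative.

```python
def nrm(result, ground_truth):
    """NRM = (NR_FN + NR_FP) / 2 for binary images with text = 1, non-text = 0."""
    pairs = [(r, g)
             for res_row, gt_row in zip(result, ground_truth)
             for r, g in zip(res_row, gt_row)]
    tp = sum(1 for r, g in pairs if r == 1 and g == 1)
    fp = sum(1 for r, g in pairs if r == 1 and g == 0)
    fn = sum(1 for r, g in pairs if r == 0 and g == 1)
    tn = sum(1 for r, g in pairs if r == 0 and g == 0)
    nr_fn = fn / (fn + tp) if fn + tp else 0.0   # fraction of text pixels missed
    nr_fp = fp / (fp + tn) if fp + tn else 0.0   # fraction of background marked as text
    return (nr_fn + nr_fp) / 2
```

A perfect result gives 0.0 and a fully inverted result gives 1.0, which is why lower NRM values are better.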

The results are shown in Table 1. For HDIBCO2014, the F-measure and NRM values for grey level were 
better than those for RGB. The F-measure for grey level values was 55.52, compared to 52.65 for RGB, and the 
NRM for grey level was lower than that for RGB (0.22 versus 0.26). Similar results were obtained for DIBCO2012 
and DIBCO2016. For images in DIBCO2012, the F-measure for grey level values was 44.42, compared to 42.13 for 
RGB, whereas for DIBCO2016 the F-measure for grey level values was only slightly higher than for RGB values. 
This is supported by the NRM values, which were smaller for grey level than for RGB on both DIBCO2012 and 
DIBCO2016. Inspecting the results image by image shows that some images were binarized better with RGB and 
others were not. However, PSNR for grey level was lower than for RGB on HDIBCO2014 and DIBCO2016. This 
reflects a trade-off, since some images are binarized better with RGB than with grey level. Figures 5, 6 and 7 
show the binarization results for image H09.png in DIBCO2012, H09.png in HDIBCO2014 and 10.tif in 
DIBCO2016, respectively. Figures 5 and 7 show that better results were obtained when grey level values were 
used, whereas Figure 6 shows that better results were obtained when RGB values were used. 


Table 1. Results of Average F-Measure, PSNR and NRM Values with Standard Deviation for HDIBCO2014, 
DIBCO2012 and DIBCO2016 


Database     RGB                                         Grey level
             F-measure      PSNR          NRM            F-measure      PSNR          NRM
HDIBCO2014   52.65 ± 13.18  10.32 ± 2.58  0.26 ± 0.09    55.52 ± 14.96  10.21 ± 2.53  0.22 ± 0.09
DIBCO2012    42.13 ± 17.29  8.72 ± 4.33   0.26 ± 0.10    44.42 ± 11.36  9.41 ± 2.20   0.21 ± 0.07
DIBCO2016    56.04 ± 12.47  11.27 ± 4.30  0.22 ± 0.07    56.12 ± 9.79   10.91 ± 3.54  0.19 ± 0.07







Figure 5. Binarization results for H09.png in DIBCO2012: (a) original image, (b) ground truth image, (c) 
binarization result for RGB with F-measure = 35.43361, (d) binarization result for grey level with F-measure 
= 57.92860 
Figure 6. Binarization results for H09.png in HDIBCO2014: (a) original image, (b) ground truth image, (c) 
binarization result for RGB with F-measure = 48.81100, (d) binarization result for grey level with F-measure 
= 45.37836 





Figure 7. Binarization results for 10.bmp in DIBCO2016: (a) original image, (b) ground truth image, (c) 
binarization result for RGB with F-measure = 36.04240, (d) binarization result for grey level with F-measure 
= 41.23519 


5. DISCUSSION AND CONCLUSION 

The experimental results demonstrated that both RGB values and grey level values can be used as 
descriptors for a particular pixel. Overall, the F-measure and NRM values were better with grey level values than 
with RGB. Visual inspection of the original images shows little variation in colour; they are closer to grey level 
images. Hence, RGB values are not well suited as descriptors for every pixel in these images. Furthermore, 
using grey level values simplifies the algorithm and reduces the computational requirements (nine features per 
pixel instead of 27), whereas RGB colour introduces unnecessary information and increases the amount of 
training data needed to achieve better performance. In addition, RGB values are not sensitive to colour 
differences and cannot measure small colour differences [26]. The advantages of the proposed method are that 
it is easy to apply, the same model can be used for different databases containing the same type of images, and 
it is capable of obtaining good results. However, the method is time-consuming, and ground truth images are 
needed to train the models. For future study, it is proposed to use other types of colour features, for example 
HSV and CIELAB, and to observe their impact on the binarization of historical documents. 